Method and system for processing a task with robustness to missing input information

ABSTRACT

A unit is disclosed for generating combined feature maps in accordance with a processing task to be performed, the unit comprising a feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map.

CROSS-REFERENCE TO RELATED APPLICATION

The present patent application is a continuation of U.S. patent application Ser. No. 16/085,339, filed on Sep. 14, 2018 as a US National Phase Application of PCT International Patent Application No. PCT/162017051580, International Filing Date Mar. 17, 2017, which claims the benefit of U.S. Provisional Patent Application No. 62/309,682, filed on Mar. 17, 2016, each of which is incorporated herein by reference.

FIELD OF THE INVENTION

The invention relates to data processing. More precisely, this invention relates to a method and system for processing a task with robustness to missing input information.

BACKGROUND OF THE INVENTION

In medical image analysis, image processing such as image segmentation is an important task and is essential to visualizing and quantifying the severity of the pathology in clinical practice. Multi-modality imaging provides complementary information to discriminate specific tissues, anatomy and pathologies. However, manual segmentation is long, painstaking and subject to human variability. In the last decades, numerous segmentation approaches have been developed to automate medical image segmentation.

These methods can be grouped into two categories, multi-atlas and model-based.

The multi-atlas approaches estimate online intensity similarities between the subject being segmented and multi-atlases or images with expert labels. These multi-atlas techniques have shown excellent results in structural segmentation when using non-linear registration [Iglesias, J. E., Sabuncu, M. R.: Multi-atlas segmentation of biomedical images: A survey. Medical image analysis 24(1), 205-219 (2015)]; when combined with non-local approaches they have proven effective in segmenting diffuse and sparse pathologies (e.g., multiple sclerosis (MS) lesions [Guizard, N., Coupé, P., Fonov, V. S., Manjón, J. V., Arnold, D. L., Collins, D. L.: Rotation-invariant multi-contrast non-local means for ms lesion segmentation. NeuroImage: Clinical 8, 376-389 (2015)]) as well as more complex multi-label pathology (e.g., Glioblastoma [Cordier, N., Delingette, H., Ayache, N.: A patch-based approach for the segmentation of pathologies: Application to glioma labelling. IEEE Transactions on Medical Imaging PP(99), 1-1 (2016)]). Multi-atlas methods rely on image intensity and spatial similarity, which can be difficult to describe fully with the atlases and which depend heavily on the image pre-processing.

Model-based approaches, in contrast, are typically trained offline to identify a discriminative model of image intensity features. These features can be predefined by the user (e.g., within random decision forest (RDF) [Geremia, E., Menze, B. H., Ayache, N.: Spatially adaptive random forests pp. 1344-1347 (2013)]) or automatically extracted and learned hierarchically directly from the images [Brosch, T., Yoo, Y., Tang, L. Y. W., Li, D. K. B., Traboulsee, A., Tam, R.: Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, Oct. 5-9, 2015, Proceedings, Part III, chap. Deep Convolutional Encoder Networks for Multiple Sclerosis Lesion Segmentation, pp. 3-11. Springer International Publishing, Cham (2015)].

Both strategies are typically optimized for a specific set of multi-modal images and usually require these modalities to be available. In clinical settings, image acquisition and patient artifacts, among other hurdles, make it difficult to fully exploit all the modalities; as such, it is common for one or more modalities to be missing for a given instance. This problem is not new, and the subject of missing data analysis has spawned an immense literature in statistics (e.g., [Van Buuren, S.: Flexible imputation of missing data. CRC press (2012)]). In medical imaging, a number of approaches have been proposed, some of which require retraining a specific model with the missing modalities or synthesizing them [Hofmann, M., Steinke, F., Scheel, V., Charpiat, G., Farquhar, J., Aschoff, P., Brady, M., Schölkopf, B., Pichler, B. J.: MRI-based attenuation correction for PET/MRI: a novel approach combining pattern recognition and atlas registration. Journal of Nuclear Medicine 49(11), 1875-1883 (2008)]. Synthesis can improve multi-modal classification by adding information of the missing modalities in the context of a simple classifier (e.g., RDF) [Tulder, G., Bruijne, M.: Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, Oct. 5-9, 2015, Proceedings, Part I, chap. Why Does Synthesized Data Improve Multi-sequence Classification?, pp. 531-538. Springer International Publishing, Cham (2015)]. Approaches to mimicking with partial features a classifier trained with a complete set of features have also been proposed [Hor, S., Moradi, M.: Scandent tree: A random forest learning method for incomplete multimodal datasets. In: Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, pp. 694-701. Springer (2015)].

Typical convolutional neural network (CNN) architectures take a multiplane image as input and process it through a sequence of convolutional layers (followed by nonlinearities such as ReLU(·)≡max(0,·)), alternating with optional pooling layers, to yield a per-pixel or per-image output [Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press (2016)]. In such networks, every input plane is assumed to be present within a given instance: since the very first convolutional layer mixes input values coming from all planes, any missing plane introduces a bias in the computation that the network is not equipped to deal with.
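The effect described above can be illustrated with a minimal sketch, assuming a single pixel, a first layer reduced to one weighted sum over three input planes, and hypothetical weights chosen only for illustration. Zero-filling the missing plane changes the response even though the other planes are unchanged.

```python
def first_layer_response(planes, weights, bias=0.0):
    """ReLU of a weighted sum that mixes the values of all input planes."""
    return max(0.0, sum(w * x for w, x in zip(weights, planes)) + bias)


# Hypothetical weights of one first-layer unit, one weight per input plane.
weights = [0.5, -0.25, 1.0]

# All three planes present at this pixel.
complete = first_layer_response([2.0, 4.0, 1.0], weights)

# Third plane missing and naively replaced by zero.
zero_filled = first_layer_response([2.0, 4.0, 0.0], weights)
```

In this toy case the unit responds to the complete input and is silenced by the zero-filled one, a shift the unit was never trained to compensate for.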

There is therefore a need for a method and system that will overcome at least one of the above-identified drawbacks.

Features of the invention will be apparent from review of the disclosure, drawings and description of the invention below.

BRIEF SUMMARY OF THE INVENTION

According to a broad aspect, there is disclosed a unit for generating a vector of at least one numeric value to be used for processing a task, the unit for generating a vector comprising a unit for generating combined feature maps, the unit for generating combined feature maps comprising a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map; a second feature map generating unit, the second feature map generating unit for receiving the at least one corresponding combined feature map from the unit for generating combined feature maps and for generating at least one final feature map using at least one corresponding transformation; wherein the generating of the at least one final feature map is performed by applying each of the at least one corresponding transformation on at least one of the at least one corresponding feature map received from the unit for generating combined feature maps; wherein the at least one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a feature map processing unit for receiving the generated at least one final feature map from the second feature map generating unit and for processing the generated at least one final feature map to provide a generated vector of at least one numeric value to be used for processing the task.

In accordance with an embodiment, the initial training is performed according to a pseudo-curriculum learning scheme wherein after a few iterations where all modalities are presented, modalities are randomly dropped.
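A minimal sketch of such a schedule follows. The warm-up length, the drop probability, the rule that at least one modality is always kept, and the modality names are all assumptions made for illustration; the text above only states that all modalities are presented for a few iterations and are then randomly dropped.

```python
import random


def sample_available_modalities(modalities, iteration, warmup_iterations,
                                rng, drop_prob=0.5):
    """Return the modalities presented at a given training iteration.

    During the first `warmup_iterations` iterations every modality is kept.
    Afterwards each modality is dropped independently with probability
    `drop_prob`. If every modality would be dropped, one is kept at random
    so that the input is never empty (an assumption of this sketch).
    """
    if iteration < warmup_iterations:
        return list(modalities)
    kept = [m for m in modalities if rng.random() >= drop_prob]
    if not kept:
        kept = [rng.choice(list(modalities))]
    return kept


rng = random.Random(0)                               # fixed seed for repeatability
modalities = ["modality_a", "modality_b", "modality_c"]  # hypothetical names
schedule = [
    sample_available_modalities(modalities, it, warmup_iterations=3, rng=rng)
    for it in range(10)
]
```

Each entry of `schedule` lists the modalities that would be fed to the network at that iteration; the first three entries contain every modality.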

In accordance with an embodiment, each of the more than one corresponding transformation comprises a machine learning model composed of at least a plurality of levels of non-linear operations.

In accordance with an embodiment, each of the more than one corresponding transformation comprises more than one layer of convolutional neural networks followed by fully connected layers.

In accordance with an embodiment, each of the generated more than one corresponding feature map is represented using one of a polynomial, a radial basis function, and a sigmoid kernel.
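For reference, the three kernel families named above are commonly defined as in the following sketch. The parameter names (`degree`, `gamma`, `coef0`) and their default values are assumptions borrowed from common usage; the text does not specify how the kernels are parameterized or applied to the feature maps.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    return math.tanh(gamma * dot(x, y) + coef0)
```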

In accordance with an embodiment, the processing task to be performed comprises an image processing task selected from a group consisting of an image segmentation, an image classification, an image detection, a pixel-wise classification and a detection of patches in images.

In accordance with an embodiment, each of the at least one corresponding transformation of the second feature map generating unit comprises a machine learning model composed of at least one level of at least one of a non-linear operation and a linear operation.

According to a broad aspect, there is disclosed a non-transitory computer-readable storage medium for storing computer-executable instructions which, when executed, cause a processing device to perform a method for processing a task, the method comprising providing a unit for generating a vector of at least one numeric value to be used for processing a task, the unit for generating a vector of at least one numeric value to be used for processing a task comprising a unit for generating combined feature maps, the unit for generating combined feature maps comprising a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map, a second feature map generating unit, the second feature map generating unit for receiving the at least one corresponding combined feature map from the unit for generating combined feature maps and for generating at least one final feature map using at least one corresponding transformation; wherein the generating of the at least one final feature map is performed by applying each of the at least one corresponding transformation on at least one of the at least one corresponding feature map received from the unit for generating combined feature maps; wherein the at least one corresponding transformation is generated following an initial training performed in accordance with the task to be performed, and a feature map processing unit for receiving the generated at least one final feature map from the second feature map generating unit and for processing the generated at least one final feature map to provide a generated vector of at least one numeric value to be used for processing the task; training the unit for generating combined feature maps and the second feature map generating unit using training data; providing at least one modality to the unit for generating a vector of at least one numeric value to be used for processing a task; and obtaining a corresponding vector of at least one numeric value.

According to a broad aspect, there is disclosed a non-transitory computer-readable storage medium for storing computer-executable instructions which, when executed, cause a processing device to perform a method for performing a task, the method comprising providing a trained unit for generating a vector of at least one numeric value to be used for processing a task, the unit for generating a vector of at least one numeric value to be used for processing a task comprising a unit for generating combined feature maps, the unit for generating combined feature maps comprising a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map, a second feature map generating unit, the second feature map generating unit for receiving the at least one corresponding combined feature map from the unit for generating combined feature maps and for generating at least one final feature map using at least one corresponding transformation; wherein the generating of the at least one final feature map is performed by applying each of the at least one corresponding transformation on at least one of the at least one corresponding feature map received from the unit for generating combined feature maps; wherein the at least one corresponding transformation is generated following an initial training performed in accordance with the task to be performed and a feature map processing unit for receiving the generated at least one final feature map from the second feature map generating unit and for processing the generated at least one final feature map to provide a generated vector of at least one numeric value to be used for processing the task; providing at least one modality to the trained unit for generating a vector of at least one numeric value to be used for processing the task; and obtaining a corresponding vector of at least one numeric value.

According to a broad aspect, there is disclosed a processing device comprising a central processing unit; a display device; a communication port for operatively connecting the processing device to a plurality of mobile processing devices, each carried by a user; a memory unit comprising an application for processing a task, the application comprising instructions for providing a unit for generating a vector of at least one numeric value to be used for processing a task, the unit for generating a vector of at least one numeric value to be used for processing a task comprising a unit for generating combined feature maps, the unit for generating combined feature maps comprising a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map, a second feature map generating unit, the second feature map generating unit for receiving the at least one corresponding combined feature map from the unit for generating combined feature maps and for generating at least one final feature map using at least one corresponding transformation; wherein the generating of the at least one final feature map is performed by applying each of the at least one corresponding transformation on at least one of the at least one corresponding feature map received from the unit for generating combined feature maps; wherein the at least one corresponding transformation is generated following an initial training performed in accordance with the task to be performed; and a feature map processing unit for receiving the generated at least one final feature map from the second feature map generating unit and for processing the generated at least one final feature map to provide a generated vector of at least one numeric value to be used for processing the task; instructions for training the unit for generating combined feature maps and the second feature map generating unit using training data; instructions for providing at least one modality to the unit for generating a vector of at least one numeric value to be used for processing the task and instructions for obtaining a corresponding vector of at least one numeric value.

According to a broad aspect, there is disclosed a method for processing a plurality of modalities, wherein the processing is robust to an absence of at least one modality, the method comprising receiving a plurality of modalities; processing each modality of the plurality of modalities using a respective transformation to generate a respective feature map comprising at least one corresponding numeric value, wherein the respective transformation operates independently of each other, further wherein the respective transformation comprises a machine learning model composed of at least a plurality of levels of non-linear operations; processing the numeric values obtained using at least one combining operation to generate at least one combined representation of the numeric values obtained, wherein the at least one combining operation comprises a computation that reduces each corresponding numeric value of each of the more than one feature maps generated down to a numeric value in the at least one combined representation of the numeric values obtained and processing the at least one combined representation of the numeric values obtained using a machine learning model composed of at least one level of at least one of a nonlinear operation and a linear operation for performing the processing of the plurality of modalities.
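The three stages of the method can be sketched as follows, under several assumptions: the per-modality transformations are toy two-level ReLU functions standing in for trained models, the combining operation is an element-wise mean (one possible reduction, chosen here for illustration), the final stage is a single linear scaling, and all names and weights are hypothetical.

```python
from statistics import fmean


def make_transformation(weight, bias):
    """Toy stand-in for a trained per-modality model: two ReLU levels."""
    def transform(values):
        hidden = [max(0.0, weight * v + bias) for v in values]
        return [max(0.0, h - 0.1) for h in hidden]
    return transform


def combine(feature_maps, operation=fmean):
    """Reduce corresponding values of all feature maps to one value each."""
    return [operation(values) for values in zip(*feature_maps)]


def head(combined):
    """Toy stand-in for the final model: a single linear operation."""
    return [2.0 * c for c in combined]


def process(modalities, transformations):
    """Transform each available modality independently, combine, then finish."""
    feature_maps = [
        transformations[name](values)
        for name, values in modalities.items()
        if values is not None          # a missing modality is simply skipped
    ]
    if not feature_maps:
        raise ValueError("at least one modality must be available")
    return head(combine(feature_maps))


transformations = {
    "A": make_transformation(1.0, 0.0),
    "B": make_transformation(0.5, 0.2),
}
full = process({"A": [1.0, 2.0], "B": [2.0, 4.0]}, transformations)
partial = process({"A": [1.0, 2.0], "B": None}, transformations)
```

Because the reduction runs over whichever feature maps exist, `partial` has the same shape as `full` and the downstream stage needs no change when modality "B" is absent.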

According to a broad aspect, there is disclosed a unit for generating combined feature maps in accordance with a processing task to be performed, the unit for generating combined feature maps comprising a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map.

According to one embodiment, the combining of the corresponding more than one feature map generated by the feature map generating unit is performed in accordance with more than one combining operation; wherein each combining operation is independent from one another.
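As an illustration of several independent combining operations, the sketch below applies a mean, a population variance and a maximum to the same stack of feature maps, each producing its own combined feature map. The choice of these three operations is an assumption made for the example; the text only requires that the operations be independent from one another.

```python
from statistics import fmean, pvariance

# Hypothetical set of combining operations; each one reduces the values found
# at the same position of every feature map to a single number.
COMBINING_OPERATIONS = {
    "mean": fmean,
    "variance": pvariance,
    "max": max,
}


def combine_all(feature_maps, operations=COMBINING_OPERATIONS):
    """Apply every combining operation independently to the same feature maps."""
    columns = list(zip(*feature_maps))
    return {
        name: [operation(column) for column in columns]
        for name, operation in operations.items()
    }


two_maps = combine_all([[1.0, 4.0], [3.0, 0.0]])
one_map = combine_all([[1.0, 4.0]])
```

With a single available feature map the variance is zero everywhere, while the mean and the maximum both reduce to the map itself.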

According to a broad aspect, there is disclosed a segmentation unit for generating a segmentation mask of an image, the segmentation unit comprising a first group of kernels comprising at least one layer of kernels, each layer comprising more than one set of a plurality of convolution kernels to be trained; each set for receiving a specific modality of the image and for generating a plurality of corresponding feature maps; a combining unit for combining, for each convolution kernel to be trained of the plurality of convolution kernels to be trained, each feature map generated by a given convolution kernel to be trained in each set of the more than one set of a plurality of convolution kernels to be trained to thereby provide a plurality of corresponding combined feature maps; a second group of kernels comprising at least one layer of kernels, each layer comprising a set of a plurality of convolution kernels to be trained; each set of a plurality of convolution kernels to be trained for receiving a corresponding combined feature map generated by the combining unit and for generating a plurality of feature maps and a feature map processing unit for receiving the plurality of generated feature maps from the second group of convolution kernels and for processing the plurality of generated feature maps to provide the segmentation mask of the image.

According to another broad aspect, there is disclosed a non-transitory computer-readable storage medium for storing computer-executable instructions which, when executed, cause a processing device to perform a method for segmenting an image, the method comprising providing a segmentation unit for generating a segmentation mask of an image, the segmentation unit comprising a first group of convolution kernels comprising at least one layer of convolution kernels, each layer comprising more than one set of a plurality of convolution kernels to be trained; each set for receiving a specific modality of the image and for generating a plurality of corresponding feature maps; a combining unit for combining, for each convolution kernel to be trained of the plurality of convolution kernels to be trained, each feature map generated by a given convolution kernel to be trained in each set of the more than one set of a plurality of convolution kernels to be trained to thereby provide a plurality of corresponding combined feature maps; a second group of convolution kernels comprising at least one layer of convolution kernels, each layer comprising a set of a plurality of convolution kernels to be trained; each set of a plurality of convolution kernels to be trained for receiving a corresponding combined feature map generated by the combining unit and for generating a plurality of feature maps; and a feature map processing unit for receiving the plurality of generated feature maps from the second group of convolution kernels and for processing the plurality of generated feature maps to provide the segmentation mask of the image; training each convolution kernel using training data; providing at least one modality of the image to segment to the segmentation unit and providing a corresponding segmentation mask of the image.

According to another broad aspect, there is disclosed a non-transitory computer-readable storage medium for storing computer-executable instructions which, when executed, cause a processing device to perform a method for segmenting an image, the method comprising providing a trained segmentation unit for generating a segmentation mask of an image, the segmentation unit comprising a first group of convolution kernels comprising at least one layer of convolution kernels, each layer comprising more than one set of a plurality of convolution kernels; each set for receiving a specific modality of the image and for generating a plurality of corresponding feature maps; a combining unit for combining, for each convolution kernel of the plurality of convolution kernels, each feature map generated by a given convolution kernel in each set of the more than one set of a plurality of convolution kernels to thereby provide a plurality of corresponding combined feature maps; a second group of convolution kernels comprising at least one layer of convolution kernels, each layer comprising a set of a plurality of convolution kernels; each set of a plurality of convolution kernels for receiving a corresponding combined feature map generated by the combining unit and for generating a plurality of feature maps and a feature map processing unit for receiving the plurality of generated feature maps from the second group of convolution kernels and for processing the plurality of generated feature maps to provide the segmentation mask of the image; providing at least one modality of the image to segment to the segmentation unit and providing a corresponding segmentation mask of the image.

According to another broad aspect, there is disclosed a processing device comprising a central processing unit; a display device; a communication port for operatively connecting the processing device to a plurality of mobile processing devices, each carried by a user; a memory unit comprising an application for performing a segmentation of an image, the application comprising instructions for providing a segmentation unit for generating a segmentation mask of an image, the segmentation unit comprising a first group of convolution kernels comprising at least one layer of convolution kernels, each layer comprising more than one set of a plurality of convolution kernels to be trained; each set for receiving a specific modality of the image and for generating a plurality of corresponding feature maps; a combining unit for combining, for each convolution kernel to be trained of the plurality of convolution kernels to be trained, each feature map generated by a given convolution kernel to be trained in each set of the more than one set of a plurality of convolution kernels to be trained to thereby provide a plurality of corresponding combined feature maps; a second group of convolution kernels comprising at least one layer of convolution kernels, each layer comprising a set of a plurality of convolution kernels to be trained; each set of a plurality of convolution kernels to be trained for receiving a corresponding combined feature map generated by the combining unit and for generating a plurality of feature maps and a feature map processing unit for receiving the plurality of generated feature maps from the second group of convolution kernels and for processing the plurality of generated feature maps to provide the segmentation mask of the image; instructions for training each convolution kernel of the segmentation unit using training data; instructions for providing at least one modality of the image to segment to the segmentation unit and instructions for providing a corresponding segmentation mask of the image.

An advantage of the method for processing a plurality of modalities disclosed herein is that it is robust to any combinatorial subset of available modalities provided as input without the need to learn a combinatorial number of imputation models.
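To give a sense of the combinatorial growth referred to above, the following sketch enumerates every non-empty subset of a set of modalities; with n modalities there are 2^n - 1 such subsets. The modality names are hypothetical placeholders.

```python
from itertools import combinations


def non_empty_subsets(modalities):
    """List every non-empty subset of the given modalities."""
    return [
        subset
        for size in range(1, len(modalities) + 1)
        for subset in combinations(modalities, size)
    ]


# Four hypothetical modalities already give 2**4 - 1 possible input subsets.
subsets = non_empty_subsets(["A", "B", "C", "D"])
```

An approach that needs a dedicated model per subset of available modalities would have to cover each entry of `subsets` separately.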

Another advantage of the method for processing a plurality of modalities disclosed herein is that it is robust to any subset of missing modalities.

Another advantage of the method for processing a plurality of modalities disclosed herein is that it takes advantage of several modalities, which may vary from instance to instance.

Another advantage of the method for processing a plurality of modalities disclosed herein is that it does not require a “least common denominator” modality that absolutely must be present for every instance.

BRIEF DESCRIPTION OF THE DRAWINGS

In order that the invention may be readily understood, embodiments of the invention are illustrated by way of example in the accompanying drawings.

FIG. 1 is a flowchart that shows an embodiment of a method for segmenting an image which is an embodiment of a method for processing a task, wherein the processing of the task comprises image segmentation.

FIG. 2 is a block diagram that shows a first embodiment of a segmentation unit used in a method for segmenting an image. It will be appreciated that the segmentation unit is an embodiment of a unit for generating a vector of at least one numeric value to be used for processing a task, wherein the processing of the task comprises image segmentation.

FIG. 3 is a block diagram that shows a second embodiment of a segmentation unit used in a method for segmenting an image.

FIG. 4 is a diagram that shows an embodiment of a processing device that may be used for implementing the method for processing a task wherein the processing of the task comprises segmenting an image.

Further details of the invention and its advantages will be apparent from the detailed description included below.

DETAILED DESCRIPTION OF THE INVENTION

In the following description of the embodiments, references to the accompanying drawings are by way of illustration of an example by which the invention may be practiced.

Terms

The term “invention” and the like mean “the one or more inventions disclosed in this application,” unless expressly specified otherwise.

The terms “an aspect,” “an embodiment,” “embodiment,” “embodiments,” “the embodiment,” “the embodiments,” “one or more embodiments,” “some embodiments,” “certain embodiments,” “one embodiment,” “another embodiment” and the like mean “one or more (but not all) embodiments of the disclosed invention(s),” unless expressly specified otherwise.

A reference to “another embodiment” or “another aspect” in describing an embodiment does not imply that the referenced embodiment is mutually exclusive with another embodiment (e.g., an embodiment described before the referenced embodiment), unless expressly specified otherwise.

The terms “including,” “comprising” and variations thereof mean “including but not limited to,” unless expressly specified otherwise.

The terms “a,” “an” and “the” mean “one or more,” unless expressly specified otherwise.

The term “plurality” means “two or more,” unless expressly specified otherwise.

The term “herein” means “in the present application, including anything which may be incorporated by reference,” unless expressly specified otherwise.

The term “whereby” is used herein only to precede a clause or other set of words that express only the intended result, objective or consequence of something that is previously and explicitly recited. Thus, when the term “whereby” is used in a claim, the clause or other words that the term “whereby” modifies do not establish specific further limitations of the claim or otherwise restrict the meaning or scope of the claim.

The term “e.g.” and like terms mean “for example,” and thus do not limit the terms or phrases they explain. For example, in a sentence “the computer sends data (e.g., instructions, a data structure) over the Internet,” the term “e.g.” explains that “instructions” are an example of “data” that the computer may send over the Internet, and also explains that “a data structure” is an example of “data” that the computer may send over the Internet. However, both “instructions” and “a data structure” are merely examples of “data,” and other things besides “instructions” and “a data structure” can be “data.”

The term “i.e.” and like terms mean “that is,” and thus limit the terms or phrases they explain.

The term “multimodal dataset” and like terms mean a dataset for which each instance is composed of data having different modalities (or types). For example, in medical imaging, a multimodal dataset consists of having different imaging modalities simultaneously for each patient instance, such as computed tomography (CT), ultrasound or various kinds of magnetic resonance (MR) images.

The term “processing task” means applying a trained machine learning model on a given set of data, wherein a machine learning task depends on the nature of a learning “signal” or “feedback” available to a learning algorithm during a training, on a set of modalities pertinent for the given set of data. Non limiting examples of “processing task” in healthcare comprise image segmentation, image classification, pixel-wise classification, detection of patches in images, classification of patches in images, stratifying patients, identifying radiomic phenotype relating to biodistribution, target occupancy, pharmacodynamics effects, tumor heterogeneity and predicting treatment response, from multiple modalities.

The term “modality” means any of the various types of equipment or probes used to acquire information, directly or indirectly, of relevant object or phenomenon for the task to be performed. Non limiting examples of “modality” in healthcare comprise radiography imaging, ultrasound imaging, magnetic resonance imaging, genetic testing, pathology testing and biosensors.

The term “feature map” means the result of applying a function to a topologically arranged vector of numbers to obtain a vector of corresponding output numbers preserving a topology. A non limiting example of a “feature map” is the result of using a layer of convolutional neural network mapping input features to hidden units to form new features to be fed to the next layer of convolutional neural network.

The term “training” means the process of training a machine learning model by providing a machine learning algorithm with a set of modalities to learn from, wherein the set of modalities contains a target attribute, and further wherein the machine learning model finds patterns in the set of modalities that map input data attributes to a target or task attribute. “Training” outputs a machine learning model that captures these patterns. Non limiting examples of “training” comprise supervised training, unsupervised training and curriculum training specifically in the context of non-convex training criteria.

The term “combining operation” means a calculation between numbers, from zero or more input operands to an output value. Non limiting examples of “combining operation” are arithmetic and higher arithmetic operations.

Neither the Title nor the Abstract is to be taken as limiting in any way as the scope of the disclosed invention(s). The title of the present application and headings of sections provided in the present application are for convenience only, and are not to be taken as limiting the disclosure in any way.

Numerous embodiments are described in the present application, and are presented for illustrative purposes only. The described embodiments are not, and are not intended to be, limiting in any sense. The presently disclosed invention(s) are widely applicable to numerous embodiments, as is readily apparent from the disclosure. One of ordinary skill in the art will recognize that the disclosed invention(s) may be practiced with various modifications and alterations, such as structural and logical modifications. Although particular features of the disclosed invention(s) may be described with reference to one or more particular embodiments and/or drawings, it should be understood that such features are not limited to usage in the one or more particular embodiments or drawings with reference to which they are described, unless expressly specified otherwise.

Now referring to FIG. 1, there is shown an embodiment of a method for segmenting an image.

It will be appreciated by the skilled addressee that the segmenting of an image is one embodiment of a processing task to be performed. In an alternative embodiment, the image processing task is one of an image classification, an image detection, a pixel-wise classification and a detection of patches in images. In an alternative embodiment, the processing task to be performed comprises a treatment response prediction from multiple modalities.

According to processing step 102, a segmentation unit is provided.

It will be appreciated that the segmentation unit is an embodiment of a unit for generating a vector of at least one numeric value to be used for processing a task, wherein the processing of the task comprises image segmentation.

It will be appreciated that the segmentation unit disclosed herein is used to segment images having any subset of modalities, i.e. images having incomplete multi-modal datasets.

In one embodiment, the images are medical images.

The skilled addressee will appreciate that various alternative embodiments may be provided for the images.

More precisely and as further explained below, the segmentation unit disclosed herein uses a deep learning framework to achieve the purpose of segmenting images having any subset of modalities.

As disclosed below, each modality is initially processed by its own convolutional pipeline, independently of all others. After at least one independent stage, feature maps from all available modalities are merged by computing map-wise statistics, such as the mean and the variance, whose expectations do not depend on the number of terms (i.e., modalities) that are provided. After merging, the mean and variance feature maps are concatenated and fed into a final set of convolutional stages to obtain the network output.

It will therefore be appreciated that, in the method disclosed herein, each modality contributes an independent term to the mean and variance; in contrast to a prior-art vanilla convolutional neural network architecture, a missing modality does not throw the computation off: the mean and variance terms simply become estimated with wider standard errors.

Now referring to FIG. 2, there is shown a first embodiment of a segmentation unit 199 for generating a segmentation mask of an image.

It will be appreciated that the first group of convolution kernels 200 is an embodiment of a feature map generating unit. The feature map generating unit is used for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other. It will be appreciated that the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality. It will be further appreciated that the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed. As further explained below, the initial training is performed according to a pseudo-curriculum learning scheme wherein after a few iterations where all modalities are presented, modalities are randomly dropped.

More precisely, and still referring to FIG. 2, the segmentation unit 199 comprises a first group of convolution kernels 200.

The first group of convolution kernels comprises at least one layer of convolution kernels 206.

It will be appreciated that each layer of convolution kernels comprises more than one set of a plurality of convolution kernels to be trained.

More precisely, each set of a plurality of convolution kernels to be trained is for receiving a specific modality of the image and for generating a plurality of corresponding feature maps.

In the embodiment of FIG. 2, the first group of convolution kernels 200 comprises a first layer of convolution kernels 206.

The first layer of kernels 206 comprises a first set of convolution kernels 216.

Still referring to FIG. 2, the first set of convolution kernels 216 comprises convolution kernel 218, convolution kernel 220, . . . and convolution kernel 222.

It will be appreciated that each of the convolution kernel 218, the convolution kernel 220 and the convolution kernel 222 receives a given modality 210 of an image.

A corresponding plurality of feature maps are generated. More precisely, feature map 224 is the result of the convolution of the given modality 210 of the image by the convolution kernel 218, while feature map 226 is the result of the convolution of the given modality 210 of the image by the convolution kernel 220, and feature map 228 is the result of the convolution of the given modality 210 of the image by the convolution kernel 222.

Similarly, feature map 236 is the result of the convolution of the given modality 212 of the image by the convolution kernel 230, while feature map 238 is the result of the convolution of the given modality 212 of the image by the convolution kernel 232, and feature map 240 is the result of the convolution of the given modality 212 of the image by the convolution kernel 234.

The second modality of the image is therefore convolved individually with each of the convolution kernels 230, 232 and 234.

Similarly, feature map 248 is the result of the convolution of the given modality 214 of the image by the convolution kernel 242, while feature map 250 is the result of the convolution of the given modality 214 of the image by the convolution kernel 244, and feature map 252 is the result of the convolution of the given modality 214 of the image by the convolution kernel 246.

The third modality of the image is therefore convolved individually with each of the convolution kernels 242, 244 and 246.

At this point it should be appreciated that, while an embodiment has been disclosed with three modalities of an image, the skilled addressee will appreciate that any number of modalities greater than or equal to two may be used.

It should also be appreciated that, while in one embodiment three modalities of the image may be available, any combination of one or more modalities may be used as an input.

For instance, in one embodiment, only modality 210 is available. In an alternative embodiment, only modalities 214 and 210 are available, etc.
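By way of a non-limiting illustration, the per-modality processing described above may be sketched as follows. The sketch assumes plain Python lists as images, untrained hand-written 3×3 kernels, zero padding at the borders, and the sliding-window (cross-correlation) form of convolution commonly used in deep learning software; the names `correlate_same`, `per_modality_feature_maps` and the `modality_21x` keys are hypothetical and only mirror the reference numerals of FIG. 2.

```python
from typing import Dict, List

FeatureMap = List[List[float]]


def correlate_same(image: FeatureMap, kernel: FeatureMap) -> FeatureMap:
    """Slide an odd-sized kernel over the image; pixels outside the image count as zero."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    r_off, c_off = k_rows // 2, k_cols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    rr, cc = r + i - r_off, c + j - c_off
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def per_modality_feature_maps(
    modalities: Dict[str, FeatureMap],
    kernel_sets: Dict[str, List[FeatureMap]],
) -> Dict[str, List[FeatureMap]]:
    """Apply to each AVAILABLE modality its own kernel set; absent modalities are skipped."""
    return {
        name: [correlate_same(image, kernel) for kernel in kernel_sets[name]]
        for name, image in modalities.items()
    }


# Illustrative kernels (assumed values, not trained): identity and horizontal difference.
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
h_diff = [[0, 0, 0], [-1, 0, 1], [0, 0, 0]]
kernel_sets = {
    "modality_210": [identity, h_diff],
    "modality_212": [identity, h_diff],
    "modality_214": [identity, h_diff],
}

# Only modalities 210 and 214 are available for this instance.
image_210 = [[1.0, 2.0], [3.0, 4.0]]
image_214 = [[0.0, 1.0], [1.0, 0.0]]
feature_maps = per_modality_feature_maps(
    {"modality_210": image_210, "modality_214": image_214}, kernel_sets
)
```

In this sketch each modality keeps its own kernel set, so a missing modality simply produces no feature maps and leaves the other pipelines untouched.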

Still referring to FIG. 2, it will be appreciated that the segmentation unit 199 further comprises a combining unit 202.

It will be appreciated that the combining unit 202 is an embodiment of a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map. Moreover, it will be appreciated that the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map.

It will be appreciated that in one embodiment, the combining of the corresponding more than one feature map generated by the feature map generating unit is performed in accordance with more than one combining operation. It will be appreciated that in one embodiment, wherein more than one combining operation is used, each combining operation is independent from one another.

More precisely and in the embodiment shown in FIG. 2, the combining unit 202 is used for combining, for each convolution kernel to be trained of the plurality of convolution kernels to be trained, each feature map generated by a given convolution kernel to be trained in each set of the more than one set of a plurality of convolution kernels to be trained to thereby provide a plurality of corresponding combined feature maps.

More precisely, in the combining unit 202, a feature map 260 is generated as a result of the combination of feature map 224 with feature map 236 and with feature map 248.

In the combining unit 202, feature map 262 is generated as a result of the combination of feature map 226 with feature map 238 and with feature map 250.

In the combining unit 202, feature map 264 is generated as a result of the combination of feature map 228 with feature map 240 and with feature map 252.

It will be appreciated that the combination of the feature maps may be performed according to various embodiments.

For instance, the combination may be selected from a group consisting of a computation of a mean, a computation of a variance and a computation of higher-order statistics such as the skewness or kurtosis, as well as computation of quantile statistics, as well as any computation that reduces an unordered non-empty set of numbers down to one number. In fact and as mentioned above, the combination is performed using a combining operation which reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map.
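As a minimal sketch of such a combining operation, the following example reduces, position by position, the values of several same-sized feature maps down to one number using an interchangeable reducer. The function name `combine_maps` and the numeric values are assumptions made for illustration only; the three input maps merely stand in for feature maps such as 224, 236 and 248.

```python
import statistics
from typing import Callable, List, Sequence

FeatureMap = List[List[float]]


def combine_maps(
    maps: Sequence[FeatureMap],
    reducer: Callable[[List[float]], float],
) -> FeatureMap:
    """Reduce the values found at each position across all maps down to one value."""
    if not maps:
        raise ValueError("at least one feature map is required")
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [reducer([m[r][c] for m in maps]) for c in range(cols)]
        for r in range(rows)
    ]


# Stand-ins for the feature maps produced by the same kernel position in each set.
maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 2.0], [1.0, 0.0]],
    [[5.0, 2.0], [5.0, 2.0]],
]

mean_map = combine_maps(maps, statistics.mean)      # a mean
median_map = combine_maps(maps, statistics.median)  # a quantile statistic
max_map = combine_maps(maps, max)                   # another set-to-number reduction
```

Because every reducer used here accepts any non-empty list, the same call works whether one, two or three maps are supplied.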

It will be appreciated by the skilled addressee that the purpose of the combination is to create an abstraction layer.

In one embodiment, not shown in FIG. 2, two distinct combinations are performed. A first combination performed is a mean while a second combination performed is a variance. Each distinct combination is responsible for generating a corresponding feature map.

Still referring to FIG. 2, the segmentation unit 199 further comprises a second group of convolution kernels 204.

The second group of convolution kernels 204 comprises at least one layer of convolution kernels.

It will be appreciated that the second group of convolution kernels is an embodiment of a second feature map generating unit. The second feature map generating unit is used for receiving the at least one corresponding combined feature map from the unit for generating combined feature maps and for generating at least one final feature map using at least one corresponding transformation. It will be further appreciated that the generating of the at least one final feature map is performed by applying each of the at least one corresponding transformation on at least one of the at least one corresponding feature map received from the unit for generating combined feature maps. Moreover, it will be appreciated that the at least one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed.

More precisely and in the embodiment disclosed in FIG. 2, each layer of convolution kernels comprises a plurality of convolution kernels to be trained. Each convolution kernel to be trained is used for receiving a corresponding combined feature map generated by the combining unit and for generating the segmentation mask of the image.

More precisely and in the embodiment shown in FIG. 2, the second group of kernels comprises a single layer of convolution kernels 208.

The layer of convolution kernels 208 comprises convolution kernel 266, convolution kernel 268, . . . and convolution kernel 270.

It will be appreciated that a feature map is convolved with a given kernel to generate a new feature map.

For instance, the feature map 260 is convolved with convolution kernel 266 to generate feature map 272.

Similarly, the feature map 262 is convolved with convolution kernel 268 to generate feature map 274.

The feature map 264 is convolved with convolution kernel 270 to generate feature map 276.

It will be appreciated that the segmentation unit 199 further comprises a feature map processing unit, not shown in FIG. 2.

The feature map processing unit is used for receiving the generated at least one final feature map from the second feature map generating unit and for processing the generated at least one final feature map to provide a generated vector of at least one numeric value to be used for processing the task.

In the embodiment disclosed in FIG. 2, the generated vector of at least one numeric value comprises the segmentation mask of the image.

More precisely and in the embodiment disclosed in FIG. 2, the feature map processing unit receives the feature map 272, the feature map 274 and the feature map 276 and generates a corresponding segmentation mask.

It will be appreciated that the segmentation mask is generated using a “softmax” computation across the feature maps.

It will be appreciated by the skilled addressee that various alternative embodiments may be provided.

Now referring to FIG. 3, there is shown a second embodiment of a segmentation unit 299 for generating a segmentation mask of an image.

In this embodiment, the segmentation unit 299 comprises a first group of convolution kernels 300, a combining unit 302, a second group of convolution kernels 304 and a feature map processing unit, not shown in FIG. 3.

The first group of convolution kernels 300 comprises two layers of convolution kernels, not shown, generating respectively more than one set of a plurality of feature maps. A first set of feature maps 306 and a second set of feature maps 308 are disclosed in FIG. 3.

It will be appreciated that the two layers of convolution kernels are referred to as respectively C_(k)⁽¹⁾ and C_(k)⁽²⁾.

The first set of feature maps 306 comprises feature maps that are generated following a convolution of, respectively, a first modality of an image 320, a second modality of an image 322 and an nth modality of an image 324 by respectively a plurality of convolution kernels. In this embodiment, each plurality of convolution kernels comprises 48 convolution kernels, each having a size of (5,5).

The second set of feature maps 308 comprises feature maps that are generated following a convolution of each set of feature maps with a corresponding set of convolution kernels. As outlined above, it will be appreciated that each convolution operation is followed by a ReLU operation to produce the feature map.

This applies everywhere, except in the combining unit. In this instance, the max-pooling operation disclosed below follows the ReLU. More precisely and in this embodiment, each plurality of convolution kernels comprises 48 convolution kernels, each having a size of (5,5) and a pooling (2,2) stride 1. It will be appreciated that a max-pooling operation is applied to each feature map immediately after the ReLU operation. This operation has a pooling window of 2×2, and a stride of one in one embodiment. This means that all 2×2 regions in the feature map are visited, and the maximum value within each region is taken, hence the name “max-pooling”, to yield one value per 2×2 region. A stride of “one” means that we move by one pixel, independently in the horizontal and vertical directions, such that there are as many pixels at output as there are at input. In addition, to ensure that the right number of pixels is obtained, there is zero-padding at the edges around each feature map. The purpose of this kind of max-pooling is to introduce some robustness in the location of the features identified by the convolution kernels.
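The ReLU operation followed by a 2×2 max-pooling with a stride of one may be sketched as follows. The sketch assumes that the zero-padding is added as one extra row at the bottom and one extra column at the right of the feature map, which is only one possible placement that yields as many output pixels as input pixels; the function names and the example values are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]


def relu(feature_map: FeatureMap) -> FeatureMap:
    """Replace every negative value with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2_stride_1(feature_map: FeatureMap) -> FeatureMap:
    """Take the maximum of every 2x2 region, moving one pixel at a time."""
    rows, cols = len(feature_map), len(feature_map[0])
    # Assumed padding placement: one zero column on the right, one zero row below.
    padded = [row + [0.0] for row in feature_map] + [[0.0] * (cols + 1)]
    return [
        [
            max(padded[r][c], padded[r][c + 1], padded[r + 1][c], padded[r + 1][c + 1])
            for c in range(cols)
        ]
        for r in range(rows)
    ]


convolved = [
    [-1.0, 2.0, 0.5],
    [3.0, -4.0, 1.0],
    [0.0, 6.0, -2.0],
]
pooled = max_pool_2x2_stride_1(relu(convolved))
# pooled == [[3.0, 2.0, 1.0], [6.0, 6.0, 1.0], [6.0, 6.0, 0.0]]
```

Since the values are non-negative after the ReLU operation, the zero padding never exceeds a genuine value, and the strong response of 6.0 is carried to the neighbouring positions, which illustrates the tolerance to small shifts in feature location.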

The combining unit 302 comprises a first plurality of feature maps 310 and a second plurality of feature maps 312.

The first plurality of feature maps 310 corresponds to an arithmetic average of the corresponding feature maps, while the second plurality of feature maps 312 comprises a variance of a plurality of incoming feature maps.

More precisely, modality fusion is computed here, as first and second moments across available modalities in C⁽²⁾, separately for each feature map l.

$\hat{E}_{l}[C^{(2)}] = \frac{1}{|K|}\sum\limits_{k \in K} C_{k,l}^{(2)}$

$\widehat{Var}_{l}[C^{(2)}] = \frac{1}{|K| - 1}\sum\limits_{k \in K}\left( C_{k,l}^{(2)} - \hat{E}_{l}[C^{(2)}] \right)^{2}$

with V̂ar_(l)[C⁽²⁾] defined to be zero if |K|=1.
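A minimal sketch of this modality fusion is given below, assuming that the variance is the sample variance with a divisor of |K|−1, which is consistent with the variance being defined as zero when only one modality is available. The function name `fuse_moments` and the `modality_21x` keys are hypothetical, and the values are illustrative only.

```python
import statistics
from typing import Dict, List, Tuple

FeatureMap = List[List[float]]


def fuse_moments(maps_by_modality: Dict[str, FeatureMap]) -> Tuple[FeatureMap, FeatureMap]:
    """Return the (mean map, variance map) computed across the available modalities."""
    maps = list(maps_by_modality.values())
    if not maps:
        raise ValueError("at least one modality must be available")
    rows, cols = len(maps[0]), len(maps[0][0])
    mean_map = [
        [statistics.mean([m[r][c] for m in maps]) for c in range(cols)]
        for r in range(rows)
    ]
    if len(maps) == 1:
        # The variance is defined to be zero when a single modality is available.
        var_map = [[0.0] * cols for _ in range(rows)]
    else:
        var_map = [
            [statistics.variance([m[r][c] for m in maps]) for c in range(cols)]
            for r in range(rows)
        ]
    return mean_map, var_map


# Same feature map index l, as produced by three modality pipelines.
all_present = {
    "modality_210": [[1.0, 2.0]],
    "modality_212": [[3.0, 2.0]],
    "modality_214": [[5.0, 2.0]],
}
mean_all, var_all = fuse_moments(all_present)

# The same call with a single available modality.
mean_one, var_one = fuse_moments({"modality_210": [[1.0, 2.0]]})
```

In both calls the outputs have the same shape, so the layers that follow the fusion receive inputs of a fixed size regardless of how many modalities were provided.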

The second group of kernels 304 comprises at least one layer of convolution kernels.

In the embodiment shown in FIG. 3, two layers of a plurality of convolution kernels are provided.

A first layer of a plurality of convolution kernels is used for generating a first plurality of feature maps 314.

The first layer of a plurality of convolution kernels comprises 16 kernels having a size of (5,5). The skilled addressee will appreciate that various alternative embodiments may be provided for the number of the convolution kernels as well as for the size of each convolution kernel.

A second layer of a plurality of convolution kernels is used for generating the second plurality of feature maps 316.

It will be appreciated that the last layer of the second group of kernels 304, i.e. the second layer of a plurality of convolution kernels in this embodiment, comprises a number of kernels equal to a number of classes. It will be appreciated that the number of classes represents the types of segments that we want to produce. In a simple case, two classes are provided, e.g., “tumour” and “non-tumour”. In more complex cases, we may have tumour subtypes that depend on texture characteristics of the image, and those would correspond to additional classes. It will be appreciated that in this embodiment the size of each convolution kernel of the second layer of convolution kernels is (21,21). The skilled addressee will appreciate that various alternative embodiments may be provided for the size of the convolution kernels.

More precisely and in the embodiment disclosed in FIG. 3, the second group of kernels 304 combines the merged modalities to produce the final model output.

All Ê[C⁽²⁾] and V̂ar[C⁽²⁾] feature maps are concatenated and are passed through a convolutional filter C⁽³⁾ with ReLU activation, to finish with a final layer C⁽⁴⁾ that has as many feature maps as there are target segmentation classes.

In one embodiment, the pixelwise posterior class probabilities are given by applying a softmax function across the C⁽⁴⁾ feature maps, and a full segmentation is obtained by taking the pixelwise most likely posterior class in the feature map processing unit.
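By way of example only, the softmax computation across the final feature maps and the selection of the most likely class at each pixel may be sketched as follows. The sketch assumes one final feature map per class, stored as nested Python lists; the function names and the two-class scores are illustrative assumptions, with class 0 standing for “non-tumour” and class 1 for “tumour”.

```python
import math
from typing import List

FeatureMap = List[List[float]]


def softmax_across_maps(class_maps: List[FeatureMap]) -> List[FeatureMap]:
    """Convert per-class scores into per-pixel posterior class probabilities."""
    rows, cols = len(class_maps[0]), len(class_maps[0][0])
    probs = [[[0.0] * cols for _ in range(rows)] for _ in class_maps]
    for r in range(rows):
        for c in range(cols):
            scores = [m[r][c] for m in class_maps]
            peak = max(scores)  # subtracted for numerical stability
            exps = [math.exp(s - peak) for s in scores]
            total = sum(exps)
            for k, e in enumerate(exps):
                probs[k][r][c] = e / total
    return probs


def segmentation_mask(class_maps: List[FeatureMap]) -> List[List[int]]:
    """Pick, for every pixel, the index of the most probable class."""
    probs = softmax_across_maps(class_maps)
    rows, cols = len(probs[0]), len(probs[0][0])
    return [
        [max(range(len(probs)), key=lambda k: probs[k][r][c]) for c in range(cols)]
        for r in range(rows)
    ]


# One final feature map per class: index 0 "non-tumour", index 1 "tumour".
final_maps = [
    [[2.0, 0.0], [1.0, -1.0]],
    [[0.0, 3.0], [1.5, -3.0]],
]
posteriors = softmax_across_maps(final_maps)
mask = segmentation_mask(final_maps)
# mask == [[0, 1], [1, 0]]
```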

According to processing step 104, the segmentation unit is trained.

It will be appreciated that the segmentation unit may be trained according to various embodiments.

In one embodiment, the segmentation unit is trained using a backpropagation algorithm.

As it is known to the skilled addressee, many algorithms may be used to train the segmentation unit.

In one embodiment, the training starts with the easiest situations before having to learn the difficult ones.

For instance, the training is started with a pseudo-curriculum learning scheme where after a few iterations where all modalities are presented to the segmentation unit, modalities are randomly dropped, ensuring a higher probability of dropping zero or one modality only.
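One possible way of selecting the modalities presented at each training iteration under such a scheme is sketched below. The number of initial iterations, the halving of the weight for each additional dropped modality and the rule that at least one modality is always kept are assumptions chosen for illustration and are not values taken from the present description; the function name and modality names are likewise hypothetical.

```python
import random
from typing import List, Optional, Sequence


def sample_available_modalities(
    modalities: Sequence[str],
    iteration: int,
    warmup_iterations: int,
    rng: random.Random,
    drop_weights: Optional[Sequence[float]] = None,
) -> List[str]:
    """Return the modalities presented to the segmentation unit at this iteration."""
    if iteration < warmup_iterations:
        return list(modalities)  # initial iterations: all modalities are presented
    max_drop = len(modalities) - 1  # assumption: always keep at least one modality
    if drop_weights is None:
        # Assumed weighting: dropping zero or one modality is the most likely outcome.
        drop_weights = [0.5 ** n for n in range(max_drop + 1)]
    n_drop = rng.choices(range(max_drop + 1), weights=drop_weights, k=1)[0]
    dropped = set(rng.sample(list(modalities), n_drop))
    return [m for m in modalities if m not in dropped]


rng = random.Random(0)
names = ["modality_210", "modality_212", "modality_214"]
warmup_choice = sample_available_modalities(names, iteration=3, warmup_iterations=10, rng=rng)
later_choices = [
    sample_available_modalities(names, iteration=i, warmup_iterations=10, rng=rng)
    for i in range(10, 210)
]
```

With three modalities, the default weights in this sketch give probabilities of 4/7, 2/7 and 1/7 for dropping zero, one and two modalities respectively.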

Typically, from several hundred to tens of thousands of instances may be used to train the segmentation unit.

Still referring to FIG. 1 and according to processing step 106, the segmentation unit is used. It will be appreciated that the segmentation unit may be used according to various embodiments.

In fact, it will be appreciated that the segmentation unit may be used with a set of at least one modality of an image.

It will be appreciated that the set of at least one modality of an image may be provided according to various embodiments.

Now referring to FIG. 4, there is shown an embodiment of a processing device for segmenting an image 400.

It will be appreciated that the processing device for segmenting an image 400 is an embodiment of a processing device for processing a task wherein the processing of the task comprises segmenting an image.

The processing device for segmenting an image 400 comprises a central processing unit 402, a display device 404, input devices 410, communication ports 406, a data bus 408, a memory unit 412 and a graphics processing unit (GPU) 422.

The central processing unit 402, the display device 404, the input devices 410, the communication ports 406, the memory unit 412 and the graphics processing unit 422 are interconnected using the data bus 408.

The central processing unit 402 is used for processing computer instructions. The skilled addressee will appreciate that various embodiments of the central processing unit 402 may be provided.

In one embodiment, the central processing unit 402 is a Core i7 CPU running at 3.4 GHz and manufactured by Intel™.

In one embodiment, the graphics processing unit 422 is a Titan X GPU manufactured by Nvidia™.

The display device 404 is used for displaying data to a user. The skilled addressee will appreciate that various types of display device 404 may be used.

In one embodiment, the display device 404 is a standard liquid-crystal display (LCD) monitor.

The communication ports 406 are used for sharing data with the processing device for segmenting an image 400.

The communication ports 406 may comprise, for instance, a universal serial bus (USB) port for connecting a keyboard and a mouse to the processing device for segmenting an image 400.

The communication ports 406 may further comprise a data network communication port such as an IEEE 802.3 port for enabling a connection of the processing device for segmenting an image 400 with another processing device via a data network, not shown.

The skilled addressee will appreciate that various alternative embodiments of the communication ports 406 may be provided.

In one embodiment, the communication ports 406 comprise an Ethernet port and a mouse port (e.g., Logitech™).

The memory unit 412 is used for storing computer-executable instructions.

It will be appreciated that the memory unit 412 comprises, in one embodiment, a basic input/output system, also referred to as BIOS 414.

The memory unit 412 further comprises an operating system 416.

It will be appreciated by the skilled addressee that the operating system 416 may be of various types.

In an embodiment, the operating system 416 is Linux Ubuntu operating system version 15.10 or more recent.

The memory unit 412 further comprises an application for segmenting an image 418. It will be appreciated that the application for segmenting an image 418 is an embodiment of an application for processing a task, wherein the processing of the task comprises segmenting an image.

The memory unit 412 further comprises training data 420.

It will be appreciated that the training data 420 are used for training a segmentation unit implemented in the application for segmenting an image 418.

In an alternative embodiment, the memory unit 412 does not comprise the training data 420. It will be appreciated that this is the case when the segmentation unit has been already fully trained.

The application for segmenting an image 418 comprises instructions for providing a segmentation unit for generating a segmentation mask of an image, the segmentation unit comprising a first group of convolution kernels comprising at least one layer of convolution kernels, each layer comprising more than one set of a plurality of convolution kernels to be trained; each set for receiving a specific modality of the image and for generating a plurality of corresponding feature maps; a combining unit for combining, for each convolution kernel to be trained of the plurality of convolution kernels to be trained, each feature map generated by a given convolution kernel to be trained in each set of the more than one set of a plurality of convolution kernels to be trained to thereby provide a plurality of corresponding combined feature maps; and a second group of convolution kernels comprising at least one layer of convolution kernels, each layer comprising a set of a plurality of convolution kernels to be trained; each set of a plurality of convolution kernels to be trained for receiving a corresponding combined feature map generated by the combining unit and for generating the segmentation mask of the image.

In the case where the segmentation unit is not fully trained, the application for segmenting an image 418 comprises instructions for training each convolution kernel of the segmentation unit using training data.

The application for segmenting an image 418 further comprises instructions for providing at least one modality of the image to segment to the segmentation unit.

The application for segmenting an image 418 further comprises instructions for providing a corresponding segmentation mask of the image to segment.

It will be appreciated that the application for segmenting an image 418 is an embodiment of an application for processing a task. The application for processing a task comprises instructions for providing a unit for generating a vector of at least one numeric value to be used for processing a task, the unit for generating a vector of at least one numeric value to be used for processing a task comprising a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed; and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map, a second feature map generating unit, the second feature map generating unit for receiving the at least one corresponding combined feature map from the unit for generating combined feature maps and for generating at least one final feature map using at least one corresponding transformation; wherein the generating of the at least one final feature map is performed by applying each of the at least one corresponding transformation on at least one of the at least one corresponding feature map received from the unit for generating combined feature maps; wherein the at least one corresponding transformation is generated following an initial training performed in accordance with the task to be performed; and a feature map processing unit for receiving the generated at least one final feature map from the second feature map generating unit and for processing the generated at least one final feature map to provide a generated vector of at least one numeric value to be used for processing the task. The application for processing a task further comprises instructions for training the unit for generating combined feature maps and the second feature map generating unit using training data. The application for processing a task further comprises instructions for providing at least one modality to the unit for generating a vector of at least one numeric value to be used for processing the task and instructions for obtaining a corresponding vector of at least one numeric value.

It will be appreciated that a non-transitory computer-readable storage medium is also disclosed for storing computer-executable instructions which, when executed, cause a processing device to perform a method for segmenting an image, the method comprising providing a trained segmentation unit for generating a segmentation mask of an image, the segmentation unit comprising a first group of convolution kernels comprising at least one layer of convolution kernels, each layer comprising more than one set of a plurality of convolution kernels; each set for receiving a specific modality of the image and for generating a plurality of corresponding feature maps; a combining unit for combining, for each convolution kernel of the plurality of convolution kernels, each feature map generated by a given convolution kernel in each set of the more than one set of a plurality of convolution kernels to thereby provide a plurality of corresponding combined feature maps; and a second group of convolution kernels comprising at least one layer of convolution kernels, each layer comprising a set of a plurality of convolution kernels; each set of a plurality of convolution kernels for receiving a corresponding combined feature map generated by the combining unit and for generating the segmentation mask of the image; providing at least one modality of the image to segment to the segmentation unit; and providing a corresponding segmentation mask of the image.

It will be appreciated that a non-transitory computer-readable storage medium is also disclosed for storing computer-executable instructions which, when executed, cause a processing device to perform a method for processing a task, the method comprising providing a unit for generating a vector of at least one numeric value to be used for processing a task, the unit for generating a vector of at least one numeric value to be used for processing a task comprising a unit for generating combined feature maps comprising a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map, a second feature map generating unit, the second feature map generating unit for receiving the at least one corresponding combined feature map from the unit for generating combined feature maps and for generating at least one final feature map using at least one corresponding transformation; wherein the generating of the at least one final feature map is performed by applying each of the at least one corresponding transformation on at least one of the at least one corresponding feature map received from the unit for generating combined feature maps; wherein the at least one corresponding transformation is generated following an initial training performed in accordance with the task to be performed, and a feature map processing unit for receiving the generated at least one final feature map from the second feature map generating unit and for processing the generated at least one final feature map to provide a generated vector of at least one numeric value to be used for processing the task; training the unit for generating combined feature maps and the second feature map generating unit using training data; providing at least one modality to the unit for generating a vector of at least one numeric value to be used for processing a task and obtaining a corresponding vector of at least one numeric value.

It will be appreciated that a non-transitory computer-readable storage medium is also disclosed for storing computer-executable instructions which, when executed, cause a processing device to perform a method for segmenting an image, the method comprising providing a segmentation unit for generating a segmentation mask of an image, the segmentation unit comprising a first group of convolution kernels comprising at least one layer of convolution kernels, each layer comprising more than one set of a plurality of convolution kernels to be trained; each set for receiving a specific modality of the image and for generating a plurality of corresponding feature maps; a combining unit for combining, for each convolution kernel to be trained of the plurality of convolution kernels to be trained, each feature map generated by a given convolution kernel to be trained in each set of the more than one set of a plurality of convolution kernels to be trained to thereby provide a plurality of corresponding combined feature maps; and a second group of convolution kernels comprising at least one layer of convolution kernels, each layer comprising a set of a plurality of convolution kernels to be trained; each set of a plurality of convolution kernels to be trained for receiving a corresponding combined feature map generated by the combining unit and for generating the segmentation mask of the image; training each convolution kernel using training data; providing at least one modality of the image to segment to the segmentation unit; and providing a corresponding segmentation mask of the image.

It will also be appreciated that a non-transitory computer-readable storage medium is disclosed for storing computer-executable instructions which, when executed, cause a processing device to perform a method for performing a task, the method comprising providing a trained unit for generating a vector of at least one numeric value to be used for processing a task, the unit for generating a vector of at least one numeric value to be used for processing a task comprising: a unit for generating combined feature maps, the unit for generating combined feature maps comprising a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map, a second feature map generating unit, the second feature map generating unit for receiving the at least one corresponding combined feature map from the unit for generating combined feature maps and for generating at least one final feature map using at least one corresponding transformation; wherein the generating of the at least one final feature map is performed by applying each of the at least one corresponding transformation on at least one of the at least one corresponding feature map received from the unit for generating combined feature maps; wherein the at least one corresponding transformation is generated following an initial training performed in accordance with the task to be performed; and a feature map processing unit for receiving the generated at least one final feature map from the second feature map generating unit and for processing the generated at least one final feature map to provide a generated vector of at least one numeric value to be used for processing the task; providing at least one modality to the trained unit for generating a vector of at least one numeric value to be used for processing the task and obtaining a corresponding vector of at least one numeric value.

It will be appreciated that the segmentation unit disclosed herein learns, for each modality of an image, an embedding of the image into an abstraction layer space. In this latent space, arithmetic operations, such as computing first and second moments, are well defined and can be taken over the different modalities available at inference time. This higher-level feature space can then be further processed to estimate the segmentation.
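By way of non-limiting illustration only, the computation of first and second moments over the available modalities may be sketched as follows. The function name `combine_moments`, the use of NumPy, the channel-first array layout and the choice of a population variance (rather than an unbiased estimator) are assumptions made for the purpose of this sketch and are not taken from the present description.

```python
import numpy as np


def combine_moments(feature_maps):
    """Fuse the feature maps of the available modalities.

    feature_maps: list of arrays of identical shape (channels, height, width),
    one per modality available at inference time (hypothetical layout).
    Returns the element-wise mean and variance, concatenated along channels.
    """
    if not feature_maps:
        raise ValueError("at least one modality must be available")
    stacked = np.stack(feature_maps, axis=0)  # (n_available, C, H, W)
    mean = stacked.mean(axis=0)               # first moment
    # Population variance (assumption): equals zero when only one
    # modality is available, so no special case is needed.
    var = stacked.var(axis=0)                 # second (central) moment
    return np.concatenate([mean, var], axis=0)
```

In this sketch the shape of the output does not depend on how many modalities are supplied, which is the property that lets the subsequent layers operate unchanged when a modality is missing.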

A method for processing a plurality of modalities is also disclosed. In this method, the processing is robust to an absence of at least one modality. The method comprises receiving a plurality of modalities. The method further comprises processing each modality of the plurality of modalities using a respective transformation to generate a respective feature map comprising at least one corresponding numeric value, wherein the respective transformations operate independently of each other, further wherein each respective transformation comprises a machine learning model composed of at least a plurality of levels of non-linear operations. The method further comprises processing the numeric values obtained using at least one combining operation to generate at least one combined representation of the numeric values obtained, wherein the at least one combining operation comprises a computation that reduces each corresponding numeric value of each of the more than one feature map generated down to a numeric value in the at least one combined representation of the numeric values obtained. Finally, the method comprises processing the at least one combined representation of the numeric values obtained using a machine learning model composed of at least one level of at least one of a nonlinear operation and a linear operation for performing the processing of the plurality of modalities.
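By way of non-limiting illustration only, the steps of the above method may be sketched as follows. The modality names ("T1", "T2", "FLAIR"), the layer sizes, the use of a mean as the combining operation and all function names are hypothetical, and the randomly initialized weights merely stand in for transformations that would in practice be obtained by training.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def init_layers(rng, sizes):
    """Random (weight, bias) pairs; a stand-in for trained parameters."""
    return [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def front_end(layers, x):
    """Per-modality transformation: several levels of non-linear operations."""
    for w, b in layers:
        x = relu(x @ w + b)
    return x


def process(modalities, front_ends, back_end):
    """modalities: dict mapping the name of each AVAILABLE modality to its input."""
    # Each modality is transformed independently of the others.
    maps = [front_end(front_ends[name], x) for name, x in modalities.items()]
    # Combining operation: corresponding values are reduced to one value.
    combined = np.mean(np.stack(maps, axis=0), axis=0)
    # Final model: here a single linear level.
    w, b = back_end
    return combined @ w + b


rng = np.random.default_rng(0)
names = ["T1", "T2", "FLAIR"]
front_ends = {n: init_layers(rng, [4, 8, 8]) for n in names}
back_end = (0.1 * rng.standard_normal((8, 2)), np.zeros(2))
inputs = {n: rng.standard_normal(4) for n in names}

full = process(inputs, front_ends, back_end)                    # all modalities
partial = process({"T1": inputs["T1"]}, front_ends, back_end)   # two missing
```

Both calls return a vector of the same length, since the combining step yields a representation whose size is independent of the number of modalities received.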

An advantage of the method for processing a task disclosed herein is that it is robust to any combinatorial subset of available modalities provided as input, without the need to learn a combinatorial number of imputation models.
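By way of non-limiting illustration only, the number of input configurations at issue, together with a training-time schedule in which all modalities are first presented and modalities are thereafter randomly dropped, may be sketched as follows. The function names, the warm-up length and the drop probability of 0.5 are assumptions made for this sketch only.

```python
import itertools
import random


def non_empty_subsets(modalities):
    """Every combination of available modalities: 2**n - 1 for n modalities."""
    return [combo
            for size in range(1, len(modalities) + 1)
            for combo in itertools.combinations(modalities, size)]


def drop_modalities(modalities, iteration, warmup_iterations, rng, p_drop=0.5):
    """Return the modalities presented to the model at a training iteration.

    During the warm-up all modalities are presented; afterwards each one is
    dropped independently with probability p_drop (hypothetical value),
    while always keeping at least one modality.
    """
    if iteration < warmup_iterations:
        return list(modalities)
    kept = [m for m in modalities if rng.random() >= p_drop]
    return kept or [rng.choice(list(modalities))]
```

With four modalities, `non_empty_subsets` enumerates fifteen configurations, all of which are handled by a single trained model in the approach described above.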

Although the above description relates to a specific preferred embodiment as presently contemplated by the inventor, it will be understood that the invention in its broad aspect includes functional equivalents of the elements described herein.

-   Clause 1. A unit for generating a vector of at least one numeric value to be used for processing a task, the unit for generating a vector comprising:

a unit for generating combined feature maps, the unit for generating combined feature maps comprising a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map;

a second feature map generating unit, the second feature map generating unit for receiving the at least one corresponding combined feature map from the unit for generating combined feature maps and for generating at least one final feature map using at least one corresponding transformation; wherein the generating of the at least one final feature map is performed by applying each of the at least one corresponding transformation on at least one of the at least one corresponding feature map received from the unit for generating combined feature maps; wherein the at least one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed; and

a feature map processing unit for receiving the generated at least one final feature map from the second feature map generating unit and for processing the generated at least one final feature map to provide a generated vector of at least one numeric value to be used for processing the task.

-   Clause 2. The unit for generating combined feature maps as claimed in clause 1, wherein the initial training is performed according to a pseudo-curriculum learning scheme wherein after a few iterations where all modalities are presented, modalities are randomly dropped.
-   Clause 3. The unit for generating combined feature maps as claimed in clause 1, wherein each of the more than one corresponding transformation comprises a machine learning model composed of at least a plurality of levels of non-linear operations.
-   Clause 4. The unit for generating combined feature maps as claimed in clause 1, wherein each of the more than one corresponding transformation comprises more than one layer of convolutional neural networks followed by fully connected layers.
-   Clause 5. The unit for generating combined feature maps as claimed in clause 1, wherein each of the generated more than one corresponding feature map is represented using one of a polynomial, a radial basis function, and a sigmoid kernel.
-   Clause 6. The unit for generating combined feature maps as claimed in clause 1, wherein the processing task to be performed comprises an image processing task selected from a group consisting of an image segmentation, an image classification, an image detection, a pixel-wise classification and a detection of patches in images.
-   Clause 7. The unit for generating a vector of at least one numeric value to be used for processing a task as claimed in clause 1, wherein each of the at least one corresponding transformation of the second feature map generating unit comprises a machine learning model composed of at least one level of at least one of a non-linear operation and a linear operation.
-   Clause 8. A non-transitory computer-readable storage medium for storing computer-executable instructions which, when executed, cause a processing device to perform a method for processing a task, the method comprising:

providing a unit for generating a vector of at least one numeric value to be used for processing a task, the unit for generating a vector of at least one numeric value to be used for processing a task comprising:

-   -   a unit for generating combined feature maps, the unit for generating combined feature maps comprising a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map,
    -   a second feature map generating unit, the second feature map generating unit for receiving the at least one corresponding combined feature map from the unit for generating combined feature maps and for generating at least one final feature map using at least one corresponding transformation; wherein the generating of the at least one final feature map is performed by applying each of the at least one corresponding transformation on at least one of the at least one corresponding feature map received from the unit for generating combined feature maps; wherein the at least one corresponding transformation is generated following an initial training performed in accordance with the task to be performed, and
    -   a feature map processing unit for receiving the generated at least one final feature map from the second feature map generating unit and for processing the generated at least one final feature map to provide a generated vector of at least one numeric value to be used for processing the task;

training the unit for generating combined feature maps and the second feature map generating unit using training data;

providing at least one modality to the unit for generating a vector of at least one numeric value to be used for processing a task; and

obtaining a corresponding vector of at least one numeric value.

-   Clause 9. A non-transitory computer-readable storage medium for storing computer-executable instructions which, when executed, cause a processing device to perform a method for performing a task, the method comprising:

providing a trained unit for generating a vector of at least one numeric value to be used for processing a task, the unit for generating a vector of at least one numeric value to be used for processing a task comprising:

-   -   a unit for generating combined feature maps, the unit for generating combined feature maps comprising a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map,
    -   a second feature map generating unit, the second feature map generating unit for receiving the at least one corresponding combined feature map from the unit for generating combined feature maps and for generating at least one final feature map using at least one corresponding transformation; wherein the generating of the at least one final feature map is performed by applying each of the at least one corresponding transformation on at least one of the at least one corresponding feature map received from the unit for generating combined feature maps; wherein the at least one corresponding transformation is generated following an initial training performed in accordance with the task to be performed; and
    -   a feature map processing unit for receiving the generated at least one final feature map from the second feature map generating unit and for processing the generated at least one final feature map to provide a generated vector of at least one numeric value to be used for processing the task;

providing at least one modality to the trained unit for generating a vector of at least one numeric value to be used for processing the task; and

obtaining a corresponding vector of at least one numeric value.

-   Clause 10. A processing device comprising:

a central processing unit;

a display device;

a communication port for operatively connecting the processing device to a plurality of mobile processing devices, each carried by a user;

a memory unit comprising an application for processing a task, the application comprising:

-   -   instructions for providing a unit for generating a vector of at least one numeric value to be used for processing a task, the unit for generating a vector of at least one numeric value to be used for processing a task comprising a unit for generating combined feature maps, the unit for generating combined feature maps comprising a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map, a second feature map generating unit, the second feature map generating unit for receiving the at least one corresponding combined feature map from the unit for generating combined feature maps and for generating at least one final feature map using at least one corresponding transformation; wherein the generating of the at least one final feature map is performed by applying each of the at least one corresponding transformation on at least one of the at least one corresponding feature map received from the unit for generating combined feature maps; wherein the at least one corresponding transformation is generated following an initial training performed in accordance with the task to be performed; and a feature map processing unit for receiving the generated at least one final feature map from the second feature map generating unit and for processing the generated at least one final feature map to provide a generated vector of at least one numeric value to be used for processing the task;
    -   instructions for training the unit for generating combined feature maps and the second feature map generating unit using training data;
    -   instructions for providing at least one modality to the unit for generating a vector of at least one numeric value to be used for processing the task; and
    -   instructions for obtaining a corresponding vector of at least one numeric value.

-   Clause 11. A method for processing a plurality of modalities, wherein the processing is robust to an absence of at least one modality, the method comprising:

receiving a plurality of modalities;

processing each modality of the plurality of modalities using a respective transformation to generate a respective feature map comprising at least one corresponding numeric value, wherein the respective transformation operates independently of each other, further wherein the respective transformation comprises a machine learning model composed of at least a plurality of levels of non-linear operations;

processing the numeric values obtained using at least one combining operation to generate at least one combined representation of the numeric values obtained, wherein the at least one combining operation comprises a computation that reduces each corresponding numeric value of each of the more than one feature map generated down to a numeric value in the at least one combined representation of the numeric values obtained; and

processing the at least one combined representation of the numeric values obtained using a machine learning model composed of at least one level of at least one of a nonlinear operation and a linear operation for performing the processing of the plurality of modalities.

-   Clause 12. A unit for generating combined feature maps in accordance with a processing task to be performed, the unit for generating combined feature maps comprising:

a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed; and

a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map.

-   Clause 13. The unit for generating combined feature maps as claimed in clause 1, wherein the combining of the corresponding more than one feature map generated by the feature map generating unit is performed in accordance with more than one combining operation; wherein each combining operation is independent from one another.


1. A unit for generating a vector of at least one numeric value to be used for processing a task, the unit for generating a vector comprising:

a unit for generating combined feature maps, the unit for generating combined feature maps comprising a feature map generating unit, the feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation operating independently of each other; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generation unit down to one numeric value in the at least one corresponding combined feature map;

a second feature map generating unit, the second feature map generating unit for receiving the at least one corresponding combined feature map from the unit for generating combined feature maps and for generating at least one final feature map using at least one corresponding transformation; wherein the generating of the at least one final feature map is performed by applying each of the at least one corresponding transformation on at least one of the at least one corresponding feature map received from the unit for generating combined feature maps; wherein the at least one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed; and

a feature map processing unit for receiving the generated at least one final feature map from the second feature map generating unit and for processing the generated at least one final feature map to provide a generated vector of at least one numeric value to be used for processing the task.
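
By way of non-limiting illustration only, the data flow recited above may be sketched as follows. The sketch rests on several assumptions that are not taken from the claim: the modality names (`modality_a`, `modality_b`, `modality_c`), the fixed affine-plus-rectifier functions standing in for trained transformations, the element-wise mean chosen as the combining operation, and the global average used by the feature map processing step are all hypothetical. A missing modality is represented by `None` and is simply skipped.

```python
# Hypothetical stand-ins for trained per-modality transformations.
# Each one maps a single modality (a list of floats) to one feature map
# and operates independently of the others.
TRANSFORMS = {
    "modality_a": lambda x: [max(0.0, 2.0 * v - 1.0) for v in x],
    "modality_b": lambda x: [max(0.0, 0.5 * v + 1.0) for v in x],
    "modality_c": lambda x: [max(0.0, v) for v in x],
}


def generate_feature_maps(modalities):
    """Apply each transformation to its own modality; skip missing ones."""
    return [
        TRANSFORMS[name](data)
        for name, data in modalities.items()
        if data is not None
    ]


def combine(feature_maps):
    """Reduce corresponding values across feature maps to one value each.

    The element-wise mean is an assumed combining operation.
    """
    if not feature_maps:
        raise ValueError("at least one modality must be available")
    return [sum(values) / len(values) for values in zip(*feature_maps)]


def second_transform(combined_map):
    """Assumed stand-in for the second (trained) transformation."""
    return [max(0.0, 3.0 * v - 2.0) for v in combined_map]


def process_feature_map(final_map):
    """Produce a vector of at least one numeric value (assumed: global mean)."""
    return [sum(final_map) / len(final_map)]


def run(modalities):
    """Chain the four stages in the order recited in the claim."""
    feature_maps = generate_feature_maps(modalities)
    combined = combine(feature_maps)
    final_map = second_transform(combined)
    return process_feature_map(final_map)


# Usage: the output vector has the same size whether or not a modality is missing.
full = run({"modality_a": [1.0, 2.0], "modality_b": [2.0, 4.0], "modality_c": None})
partial = run({"modality_a": [1.0, 2.0], "modality_b": None, "modality_c": None})
```

Because the combining step collapses however many feature maps are available into a single combined feature map of fixed size, the downstream stages in this sketch do not need to know which modalities were present.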